1 Research

1.1 Research Objectives

The primary aim is to develop and evaluate a predictive model to identify key factors influencing students’ mathematics achievement. The specific objectives are:

  1. Examine Relationships: To investigate the relationships between socio-economic, environmental, school-related, and behavioral factors and students’ mathematics achievement across varying proficiency levels.

  2. Develop Predictive Model: To construct a predictive model that quantifies the differential impact of these factors on students’ mathematics achievement.

  3. Evaluate Model Performance: To assess the classification performance of the predictive model, focusing on its accuracy and ability to differentiate between levels of mathematical proficiency.

  4. Interpret Key Factors: To identify and interpret the most influential factors from the predictive model and discuss their practical implications for educators and policymakers aiming to enhance students’ mathematics achievement.

1.2 Research Questions

This study addresses the following research questions:

  1. How are socio-economic, environmental, school-related, and behavioral factors related to students’ mathematics achievement across different proficiency levels?

  2. Which specific factors significantly predict mathematics achievement at various proficiency levels?

  3. How accurate is the predictive model in classifying students’ mathematics achievement?

  4. What patterns of misclassification (e.g., confusion between proficiency levels) are revealed by the confusion matrix, and what do these errors indicate about the model’s limitations?

  5. What are the most influential factors identified by the predictive model, and what actionable insights do they provide for designing interventions or policies to improve mathematics achievement across proficiency levels?

2 Data Preparation and Quality Assessment

Dataset overview: 358 observations and 29 variables; no missing data were detected in the cleaned dataset.

3 Exploratory Analysis

3.1 Target Distribution

Table 1: Distribution of Students and Average Class Sizes by Mathematics Proficiency Level
Proficiency Level Count Percentage (%) Average Class Size
Low Proficiency 25 7 43.4
Moderate Proficiency 111 31 40.9
High Proficiency 222 62 37.5

High Proficiency was the most common level (62%, n=222), followed by Moderate (31%, n=111) and Low Proficiency (7%, n=25). Average class size fell from 43.4 (Low) to 40.9 (Moderate) and 37.5 (High).

Observation: The distribution is skewed toward High Proficiency (62%), leaving only 25 students at the Low level. The counts are consistent with the four-level outcome used for modeling in Section 5 (Low = Poor, Moderate = Average, High = Good + Excellent), so this imbalance may help explain why the model classified the larger Good and Average levels better than the small Poor level. The decrease in average class size with higher proficiency (37.5 vs. 43.4) is consistent with educational research suggesting that smaller classes allow more individualized attention, although this descriptive pattern alone does not establish a causal effect.

3.2 Demographic Characteristics

Table 2: Gender Distribution by Mathematics Proficiency Level
Proficiency Level Female (n) Male (n) Female (%) Male (%)
Low Proficiency 18 7 72.0 28.0
Moderate Proficiency 77 34 69.4 30.6
High Proficiency 130 92 58.6 41.4
Statistical test: Pearson chi-square test, χ² = 4.67, df = 2, p = 0.097

Females outnumbered males at every proficiency level (e.g., 72% of Low Proficiency students). The chi-square test (p = 0.097) indicated no statistically significant association between gender and proficiency at the 0.05 level.

Observation: The higher proportion of females, particularly in Low Proficiency (72%), is notable and may reflect sampling characteristics or gender-specific engagement with mathematics. The non-significant p-value (0.097) suggests that gender alone does not strongly predict performance, but the larger share of males in High Proficiency (41.4% vs. 28.0% in Low Proficiency) warrants further exploration of potential gender-related behavioral or environmental factors.
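The chi-square statistic in Table 2 can be checked directly from the counts. The following is a minimal sketch in Python using only the standard library; the original analysis appears to have been run in R, so this is an independent re-computation, not the author’s code. It computes the Pearson statistic from the expected counts under independence and uses the closed-form chi-square upper-tail probability, which is valid only for even degrees of freedom (here df = 2).

```python
import math


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Table 2 counts: rows = Low, Moderate, High proficiency; columns = Female, Male
gender_table = [[18, 7], [77, 34], [130, 92]]
stat, df = chi_square_independence(gender_table)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

Running it prints `chi-square = 4.67, df = 2, p = 0.097`, matching the reported test.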

3.3 Students’ Participation in Activities

Distribution of Students’ Participation in Activities
Activity Count Percentage (%)
Sports 67 18.7
Quiz competition 8 2.2
Music 6 1.7
Chaplaincy 3 0.8
Band Rehearsal 2 0.6
Computer Coding 2 0.6
Debate 2 0.6
Trading 2 0.6
Art 1 0.3
Dance club 1 0.3
Football 1 0.3
Karate 1 0.3
Reading club 1 0.3
Selling 1 0.3
Sports, music 1 0.3
None 259 72.3

Key Findings:

High Non-Participation: 72.3% (n=259) of students reported no extracurricular activity (“None”), which may indicate limited access to, or engagement in, structured programs.

Sports Dominance: Sports was the most common activity (18.7%, n=67), followed by quiz competition (2.2%, n=8) and music (1.7%, n=6). All other activities (e.g., coding, debate) had minimal participation (≤0.8% each).

Statistical Significance: Activity participation was significantly associated with mathematics proficiency (Fisher’s exact test, p=0.0084); no effect size was reported for this test. Responses on the negative impact of activities were also associated with proficiency (p=0.0235): 77.5% of High Proficiency students answered “None” vs. 52.0% of Low Proficiency students, while 32.0% of Low Proficiency students answered “No” vs. 7.7% of High Proficiency students. Because the “None” responses closely track non-participation, this association may largely reflect participation status rather than perceived harm.

Model Relevance: The activity variable was excluded from the core predictive model, suggesting a secondary role compared to primary predictors like math enjoyment and family support.

Observations: Limited extracurricular engagement, particularly in cognitive activities (e.g., coding, debate), may reflect resource constraints or low awareness. Sports participation may foster discipline that indirectly supports academic performance, but the low diversity of activities limits what can be inferred. The high “None” response rate and the small numbers of students in specific activities also limit the ability to assess their impact on mathematics achievement.

Recommendations: Expand diverse extracurricular programs, especially cognitive ones such as coding and debate, to engage the 72.3% of students who do not participate. Address barriers (e.g., funding, facilities) to ensure equitable access, particularly for low-income students, and encourage structured activities that complement academic goals without harming them.

3.4 Socio-Economic Factors Analysis

Socio-Economic Factors by Mathematics Proficiency Level
Percentage distribution of students within proficiency categories
Factor Proficiency Level Category Percentage (%)
Family Support Low Proficiency Fair 32.0
Family Support Low Proficiency Good 44.0
Family Support Low Proficiency Poor 24.0
Family Support Moderate Proficiency Fair 28.8
Family Support Moderate Proficiency Good 65.8
Family Support Moderate Proficiency Poor 5.4
Family Support High Proficiency Fair 20.7
Family Support High Proficiency Good 77.0
Family Support High Proficiency Poor 2.3
Feeding Frequency Low Proficiency Once 4.0
Feeding Frequency Low Proficiency Thrice Or More 48.0
Feeding Frequency Low Proficiency Twice 48.0
Feeding Frequency Moderate Proficiency Once 9.9
Feeding Frequency Moderate Proficiency Thrice Or More 52.3
Feeding Frequency Moderate Proficiency Twice 37.8
Feeding Frequency High Proficiency Once 5.9
Feeding Frequency High Proficiency Thrice Or More 64.9
Feeding Frequency High Proficiency Twice 29.3
Financial Status Low Proficiency High Income 12.0
Financial Status Low Proficiency Low Income 20.0
Financial Status Low Proficiency Middle Income 68.0
Financial Status Moderate Proficiency High Income 18.9
Financial Status Moderate Proficiency Low Income 14.4
Financial Status Moderate Proficiency Middle Income 66.7
Financial Status High Proficiency High Income 27.0
Financial Status High Proficiency Low Income 7.7
Financial Status High Proficiency Middle Income 65.3
Pocket Money Low Proficiency 11.00 + 20.0
Pocket Money Low Proficiency 2.00 - 5.00 36.0
Pocket Money Low Proficiency 6.00 - 10.00 44.0
Pocket Money Moderate Proficiency 11.00 + 19.8
Pocket Money Moderate Proficiency 2.00 - 5.00 35.1
Pocket Money Moderate Proficiency 6.00 - 10.00 45.0
Pocket Money High Proficiency 11.00 + 32.9
Pocket Money High Proficiency 2.00 - 5.00 19.8
Pocket Money High Proficiency 6.00 - 10.00 47.3

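Family Support: High Proficiency students more often reported good family support (77.0% vs. 44.0% for Low Proficiency).

Feeding Frequency: High Proficiency students more often reported eating three or more meals a day (64.9% vs. 48.0% for Low Proficiency), although this association was not statistically significant (p=0.0875; Table 4).

Financial Status: High Income families were more than twice as common among High Proficiency students (27.0% vs. 12.0% for Low Proficiency), but this association was also not significant (p=0.0547; Table 4).

Pocket Money: High Proficiency students more often received the highest pocket-money band (32.9% at 11.00+ vs. 20.0% for Low Proficiency).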

Observation: The strong association between family support and proficiency highlights the role of the home environment in academic success: only 2.3% of High Proficiency students reported poor support, compared with 24.0% in Low Proficiency. The feeding-frequency trend suggests that nutritional stability may contribute to cognitive performance. Middle-income families predominate at every level (65.3–68.0%), indicating that financial status alone may differentiate performance less than the quality of support. Higher pocket money in High Proficiency could reflect greater access to resources or parental investment.

3.5 Educational Environment

Environmental Factors by Mathematics Proficiency Level
Percentage distribution of students within proficiency categories
Factor Proficiency Level Category Percentage (%)
Conducive Classroom Low Proficiency Maybe 20.0
Conducive Classroom Low Proficiency No 24.0
Conducive Classroom Low Proficiency Yes 56.0
Conducive Classroom Moderate Proficiency Maybe 23.4
Conducive Classroom Moderate Proficiency No 26.1
Conducive Classroom Moderate Proficiency Yes 50.5
Conducive Classroom High Proficiency Maybe 13.5
Conducive Classroom High Proficiency No 23.0
Conducive Classroom High Proficiency Yes 63.5
Distance from School Low Proficiency Close 20.0
Distance from School Low Proficiency Far 48.0
Distance from School Low Proficiency Very Close 8.0
Distance from School Low Proficiency Very Far 24.0
Distance from School Moderate Proficiency Close 21.6
Distance from School Moderate Proficiency Far 54.1
Distance from School Moderate Proficiency Very Close 5.4
Distance from School Moderate Proficiency Very Far 18.9
Distance from School High Proficiency Close 24.3
Distance from School High Proficiency Far 53.2
Distance from School High Proficiency Very Close 7.7
Distance from School High Proficiency Very Far 14.9
Mode of Transportation Low Proficiency By Commercial Vehicle 60.0
Mode of Transportation Low Proficiency By Foot 24.0
Mode of Transportation Low Proficiency By Guardian's Car 16.0
Mode of Transportation Moderate Proficiency By Commercial Vehicle 48.6
Mode of Transportation Moderate Proficiency By Foot 31.5
Mode of Transportation Moderate Proficiency By Guardian's Car 19.8
Mode of Transportation High Proficiency By Commercial Vehicle 36.9
Mode of Transportation High Proficiency By Foot 34.2
Mode of Transportation High Proficiency By Guardian's Car 28.8
Negative Home Environment Low Proficiency Maybe 32.0
Negative Home Environment Low Proficiency No 40.0
Negative Home Environment Low Proficiency Yes 28.0
Negative Home Environment Moderate Proficiency Maybe 17.1
Negative Home Environment Moderate Proficiency No 70.3
Negative Home Environment Moderate Proficiency Yes 12.6
Negative Home Environment High Proficiency Maybe 14.4
Negative Home Environment High Proficiency No 75.7
Negative Home Environment High Proficiency Yes 9.9
Proper Ventilation Low Proficiency Maybe 12.0
Proper Ventilation Low Proficiency No 28.0
Proper Ventilation Low Proficiency Yes 60.0
Proper Ventilation Moderate Proficiency Maybe 15.3
Proper Ventilation Moderate Proficiency No 18.0
Proper Ventilation Moderate Proficiency Yes 66.7
Proper Ventilation High Proficiency Maybe 11.3
Proper Ventilation High Proficiency No 16.2
Proper Ventilation High Proficiency Yes 72.5
Transportation Fare Low Proficiency 11.00 + 8.0
Transportation Fare Low Proficiency 2.00 - 5.00 40.0
Transportation Fare Low Proficiency 6.00 - 10.00 8.0
Transportation Fare Low Proficiency None 44.0
Transportation Fare Moderate Proficiency 11.00 + 5.4
Transportation Fare Moderate Proficiency 2.00 - 5.00 28.8
Transportation Fare Moderate Proficiency 6.00 - 10.00 14.4
Transportation Fare Moderate Proficiency None 51.4
Transportation Fare High Proficiency 11.00 + 8.1
Transportation Fare High Proficiency 2.00 - 5.00 16.2
Transportation Fare High Proficiency 6.00 - 10.00 14.9
Transportation Fare High Proficiency None 60.8

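Finding: Conducive classrooms (63.5% “Yes” in High vs. 56.0% in Low), proper ventilation (72.5% vs. 60.0%), and less negative home environments (9.9% “Yes” in High vs. 28.0% in Low) were more common at higher proficiency, although only the home-environment association was statistically significant (p=0.0064; conducive classroom p=0.1314, ventilation p=0.4510; Table 4). Distance from school showed no clear trend (p=0.8312). Travel by commercial vehicle fell from 60.0% (Low) to 36.9% (High) while travel by guardian’s car rose from 16.0% to 28.8%, but mode of transportation was not significant (p=0.0750); transportation fare was significant (p=0.0442), with more High Proficiency students paying no fare (60.8% vs. 44.0%).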

Observation: The higher prevalence of conducive classrooms and proper ventilation in High Proficiency suggests that the physical learning environment may influence outcomes, for example by reducing distractions and improving focus, although these differences did not reach statistical significance. The marked decrease in negative home environments (from 28.0% to 9.9%) is consistent with a protective effect of stable home settings. Distance from school was similar across levels (48.0–54.1% “Far”), suggesting it is less critical in this context, possibly because students have adequate access to transport options.

3.7 Student Behavioral Factors

Behavioral Factors by Mathematics Proficiency Level
Percentage distribution of behavioral responses
Proficiency Level Behavioral Factor Response Category n Percentage (%)
Low Proficiency Extra Classes No 14 56.0
Low Proficiency Extra Classes Yes 11 44.0
Moderate Proficiency Extra Classes No 62 55.9
Moderate Proficiency Extra Classes Yes 49 44.1
High Proficiency Extra Classes No 92 41.4
High Proficiency Extra Classes Yes 130 58.6
Low Proficiency Homework Frequency Irregularly 7 28.0
Low Proficiency Homework Frequency Not at all 5 20.0
Low Proficiency Homework Frequency Regularly 13 52.0
Moderate Proficiency Homework Frequency Irregularly 21 18.9
Moderate Proficiency Homework Frequency Not at all 7 6.3
Moderate Proficiency Homework Frequency Regularly 83 74.8
High Proficiency Homework Frequency Irregularly 16 7.2
High Proficiency Homework Frequency Not at all 7 3.2
High Proficiency Homework Frequency Regularly 199 89.6
Low Proficiency Math Enjoyment Maybe 5 20.0
Low Proficiency Math Enjoyment No 10 40.0
Low Proficiency Math Enjoyment Yes 10 40.0
Moderate Proficiency Math Enjoyment Maybe 43 38.7
Moderate Proficiency Math Enjoyment No 16 14.4
Moderate Proficiency Math Enjoyment Yes 52 46.8
High Proficiency Math Enjoyment Maybe 19 8.6
High Proficiency Math Enjoyment No 16 7.2
High Proficiency Math Enjoyment Yes 187 84.2
Low Proficiency Negative Impact of Activity No 8 32.0
Low Proficiency Negative Impact of Activity None 13 52.0
Low Proficiency Negative Impact of Activity Yes 4 16.0
Moderate Proficiency Negative Impact of Activity Maybe 5 4.5
Moderate Proficiency Negative Impact of Activity No 12 10.8
Moderate Proficiency Negative Impact of Activity None 78 70.3
Moderate Proficiency Negative Impact of Activity Yes 16 14.4
High Proficiency Negative Impact of Activity Maybe 6 2.7
High Proficiency Negative Impact of Activity No 17 7.7
High Proficiency Negative Impact of Activity None 172 77.5
High Proficiency Negative Impact of Activity Yes 27 12.2
Low Proficiency School Attendance Not at all 8 32.0
Low Proficiency School Attendance Sometimes 17 68.0
Moderate Proficiency School Attendance Not at all 47 42.3
Moderate Proficiency School Attendance Often 11 9.9
Moderate Proficiency School Attendance Sometimes 53 47.7
High Proficiency School Attendance Not at all 130 58.6
High Proficiency School Attendance Often 12 5.4
High Proficiency School Attendance Sometimes 80 36.0
Low Proficiency Study Hours 1 hour 10 40.0
Low Proficiency Study Hours 2 hours and above 7 28.0
Low Proficiency Study Hours Less than an hour 8 32.0
Moderate Proficiency Study Hours 1 hour 58 52.3
Moderate Proficiency Study Hours 2 hours and above 24 21.6
Moderate Proficiency Study Hours Less than an hour 29 26.1
High Proficiency Study Hours 1 hour 94 42.3
High Proficiency Study Hours 2 hours and above 93 41.9
High Proficiency Study Hours Less than an hour 35 15.8

Finding: Compared with Low Proficiency students, High Proficiency students more often attended extra classes (58.6% vs. 44.0%), did homework regularly (89.6% vs. 52.0%), enjoyed mathematics (84.2% vs. 40.0%), reported never missing school (58.6% vs. 32.0%), and studied two or more hours (41.9% vs. 28.0%).

Observation: The stark contrast in math enjoyment (84.2% vs. 40%) highlights its role as a motivational driver, likely encouraging sustained effort. The high rate of regular homework in High Proficiency (89.6%) reflects disciplined study habits, which are critical for mastery. The lower absence rates and increased study hours among high performers suggest that consistent engagement with school and learning activities is a key differentiator, reinforcing the importance of behavioral factors.

3.8 Class Size Analysis

Table 3: Class Size (Number on Roll) Distribution by Proficiency Level
Proficiency Level N Mean SD Median IQR
Low Proficiency 25 43.4 15.4 38 33.0
Moderate Proficiency 111 40.9 14.4 38 29.5
High Proficiency 222 37.5 13.9 35 25.8

Finding: Mean class size decreased as proficiency increased (43.4 in Low, 40.9 in Moderate, 37.5 in High).

Observation: The consistent decrease in class size from Low (43.4) to High (37.5) proficiency aligns with the hypothesis that smaller classes facilitate better teacher-student interaction and personalized instruction. The relatively large standard deviations (13.9–15.4) indicate variability in class sizes, suggesting that some high-performing students succeed in larger classes, possibly due to other supportive factors like family support or personal motivation.

4 Statistical Association Tests

4.1 Chi-Square/Fisher’s Exact Tests

Table 4: Association Tests for Factors and Proficiency
Variable Test Used χ² df p-value Cramér’s V Effect Size Significant
homework_frequency Fisher’s Exact Test NA NA <0.0001 NA NA Yes
math_enjoyment Fisher’s Exact Test NA NA <0.0001 NA NA Yes
family_support Fisher’s Exact Test NA NA 0.0001 NA NA Yes
availability_of_learning_materials Fisher’s Exact Test NA NA 0.0013 NA NA Yes
study_hours Chi-square Test 16.728 4 0.0022 0.153 Small Yes
do_you_miss_school Fisher’s Exact Test NA NA 0.0028 NA NA Yes
negative_impact_of_home_environment Fisher’s Exact Test NA NA 0.0064 NA NA Yes
activity Fisher’s Exact Test NA NA 0.0084 NA NA Yes
pocket_money Chi-square Test 13.104 4 0.0108 0.135 Small Yes
negative_impact_of_activity Fisher’s Exact Test NA NA 0.0235 NA NA Yes
teachers_ratings Fisher’s Exact Test NA NA 0.0289 NA NA Yes
extra_classes Chi-square Test 7.062 2 0.0293 0.140 Small Yes
extra_curricular_activities Chi-square Test 6.838 2 0.0327 0.138 Small Yes
quality_of_teaching Fisher’s Exact Test NA NA 0.0348 NA NA Yes
teacher_student_relation Fisher’s Exact Test NA NA 0.0410 NA NA Yes
transportation_fare Fisher’s Exact Test NA NA 0.0442 NA NA Yes
family_financial_status Fisher’s Exact Test NA NA 0.0547 NA NA No
mode_of_transportation Chi-square Test 8.498 4 0.0750 0.109 Small No
feeding Fisher’s Exact Test NA NA 0.0875 NA NA No
conducive_classroom Fisher’s Exact Test NA NA 0.1314 NA NA No
parents_occupation_grouped Fisher’s Exact Test NA NA 0.1839 NA NA No
proper_ventilation_in_class Fisher’s Exact Test NA NA 0.4510 NA NA No
distance_of_school_from_home Fisher’s Exact Test NA NA 0.8312 NA NA No
Note: NA = not applicable (Fisher’s exact test reports no χ² statistic, and Cramér’s V was computed only for the chi-square tests). Significant: unadjusted p < 0.05.

Finding: Sixteen of the 23 factors were significant (p<0.05): homework frequency, math enjoyment, family support, learning materials, study hours, school attendance, negative home environment, activity, pocket money, negative impact of activity, teacher ratings, extra classes, extracurricular activities, teaching quality, teacher-student relation, and transportation fare. Non-significant factors were financial status, transportation mode, feeding, conducive classroom, parents’ occupation, ventilation, and distance.

Behavioral: Math enjoyment, homework frequency, study hours, school attendance (p<0.01).

Socio-Economic: Family support, pocket money (p<0.05).

School-Related: Teachers’ ratings, extra classes, teaching quality, teacher-student relations (p<0.05).

Environmental: Negative home environment, activity impact (p<0.05).

Observation: The large number of significant factors (16 of 23 tested) underscores the multifaceted nature of mathematics performance, with behavioral and socio-economic factors dominating. The non-significance of financial status (p=0.0547) and feeding (p=0.0875) is surprising given their descriptive trends, and suggests that their influence may be mediated by other factors such as family support. For the five variables tested with the chi-square test, effect sizes were small (Cramér’s V 0.109–0.153), indicating that, although associations exist, each factor alone has modest explanatory power; this motivates a multivariate approach such as multinomial logistic regression.
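For the five chi-square rows of Table 4, Cramér’s V is V = sqrt(χ² / (N × (min(r, c) - 1))), where r and c are the numbers of rows and columns of the contingency table. The sketch below recomputes V and the p-values from the reported χ² and df with N = 358, using only the Python standard library. The table shapes are an assumption inferred from df = (r - 1)(c - 1) with three proficiency levels, and because the reported χ² values are rounded, the last digit of a recomputed p-value can differ slightly (mode of transportation gives 0.0749 rather than 0.0750).

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid only for even df."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


N = 358  # students in the cleaned dataset
# (variable, chi-square, df, assumed table shape) from the chi-square rows of Table 4
chi_square_rows = [
    ("study_hours", 16.728, 4, (3, 3)),
    ("pocket_money", 13.104, 4, (3, 3)),
    ("extra_classes", 7.062, 2, (3, 2)),
    ("extra_curricular_activities", 6.838, 2, (3, 2)),
    ("mode_of_transportation", 8.498, 4, (3, 3)),
]

results = {}
for name, chi2, df, (r, c) in chi_square_rows:
    v = cramers_v(chi2, N, r, c)
    p = chi2_upper_tail_even_df(chi2, df)
    results[name] = (v, p)
    print(f"{name:30s} V = {v:.3f}  p = {p:.4f}")
```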

5 Developing a Predictive Multinomial Logistic Regression Model

5.1 Data preparation

Final modeling dataset: 358 rows and 22 columns. Missing values per variable: 0 for all 22 variables (math_performance, math_enjoyment, homework_frequency, family_support, availability_of_learning_materials, study_hours, do_you_miss_school, negative_impact_of_home_environment, activity, pocket_money, negative_impact_of_activity, teachers_ratings, extra_classes, extra_curricular_activities, quality_of_teaching, teacher_student_relation, transportation_fare, number_on_roll, age, gender, family_financial_status, feeding).
Training set size: 252
Test set size: 106
Table: Distribution of Mathematics Performance Levels in Training and Test Sets
Performance Level Train Count Train Proportion Test Count Test Proportion
Poor 18 0.071 7 0.066
Average 78 0.310 33 0.311
Good 96 0.381 41 0.387
Excellent 60 0.238 25 0.236

Finding: The final dataset included 22 variables (the outcome, math_performance, plus 21 candidate predictors) and was split into 252 training (70.4%) and 106 test (29.6%) observations.

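Observation: The reduction from 29 to 22 variables retained the outcome, the 16 factors that were significant in the bivariate tests (Table 4), and five additional variables (number_on_roll, age, gender, family_financial_status, feeding), which limits the number of candidate predictors relative to the moderate sample size. Class proportions were nearly identical in the two sets (e.g., Poor: 7.1% in training vs. 6.6% in testing), which supports a fair evaluation, although the test set contains only 7 Poor students.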
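The near-identical class proportions in the training and test sets are consistent with a stratified split of about 70/30. As a hedged sketch (the 0.7 fraction, the round-up rule, the function name `stratified_split`, and the seed are all assumptions, not taken from the original analysis), sampling ceil(0.7 × n_k) students from each level k reproduces the reported training counts from the class totals (25, 111, 137, 85), which are obtained by adding the training and test counts above.

```python
import math
import random


def stratified_split(labels, train_fraction=0.7, seed=2024):
    """Split indices so each class keeps about train_fraction of its members in training.

    The training count for each class is rounded up with math.ceil (an assumed rule).
    """
    rng = random.Random(seed)  # arbitrary seed, chosen for illustration only
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        n_train = math.ceil(train_fraction * len(indices))
        chosen = set(rng.sample(indices, n_train))
        train_idx.extend(i for i in indices if i in chosen)
        test_idx.extend(i for i in indices if i not in chosen)
    return sorted(train_idx), sorted(test_idx)


# Class totals = training + test counts from the table above
totals = {"Poor": 25, "Average": 111, "Good": 137, "Excellent": 85}
labels = [level for level, count in totals.items() for _ in range(count)]
train_idx, test_idx = stratified_split(labels)
train_counts = {level: sum(labels[i] == level for i in train_idx) for level in totals}
print(len(train_idx), len(test_idx), train_counts)
```

Whatever the seed, the per-class sizes are fixed by the rounding rule (252 training and 106 test cases); only which students land in each set changes.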

6 Multinomial Logistic Regression

6.0.1 Model Development and Selection

Table 5: Model Comparison and Selection Criteria
Model Terms AIC BIC AIC_Weight
Full Model 39 670.0 1083.0 0.000
Reduced Model 20 621.1 832.9 0.006
Core Model 11 611.0 738.1 0.994
Selected model: Core Model
Predictors: math_enjoyment, homework_frequency, family_support, gender, availability_of_learning_materials, study_hours, number_on_roll

Finding: The Core Model (seven predictors contributing 11 coefficient terms: math enjoyment, homework frequency, family support, gender, learning materials, study hours, and class size) was selected because it had the lowest AIC (611.0) and BIC (738.1), with an AIC weight of 0.994.

Observation: Selecting the Core Model over the Full (39 terms) and Reduced (20 terms) models favors parsimony, balancing explanatory power against complexity. Its seven predictors appear to carry most of the explanatory information, while others (e.g., transportation fare) were excluded because their contribution to fit did not offset the additional parameters.
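The AIC weights in the model comparison follow from the AIC differences: w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i = AIC_i - min AIC. The Python sketch below recomputes them. It also uses the identity BIC - AIC = k(ln n - 2) to back out the number of estimated parameters k, assuming n = 252 training cases; for the Core Model this gives about 36, consistent with 11 terms plus an intercept in each of the three logits against Poor. The function names are illustrative.

```python
import math


def akaike_weights(aic_by_model):
    """Relative likelihoods exp(-delta/2), normalized to sum to one."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-0.5 * (aic - best)) for m, aic in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def implied_parameter_count(aic, bic, n):
    """Solve BIC - AIC = k * (ln n - 2) for k."""
    return (bic - aic) / (math.log(n) - 2)


weights = akaike_weights({"Full": 670.0, "Reduced": 621.1, "Core": 611.0})
for model, w in weights.items():
    print(f"{model}: {w:.3f}")

k_core = implied_parameter_count(611.0, 738.1, n=252)
print(f"Core Model parameters: {k_core:.1f}")  # about 3 logits x (11 terms + intercept)
```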

7.0.1 Model Diagnostics

Table 6: Generalized Variance Inflation Factors (GVIF)
Variable GVIF Df Adjusted_VIF Interpretation
family_financial_status 1.893 2 1.173 No multicollinearity
teacher_student_relation 2.133 3 1.135 No multicollinearity
quality_of_teaching 1.634 2 1.131 No multicollinearity
pocket_money 1.630 2 1.130 No multicollinearity
teachers_ratings 1.963 3 1.119 No multicollinearity
feeding 1.562 2 1.118 No multicollinearity
family_support 1.558 2 1.117 No multicollinearity
availability_of_learning_materials 1.219 1 1.104 No multicollinearity
study_hours 1.355 2 1.079 No multicollinearity
extra_classes 1.143 1 1.069 No multicollinearity
homework_frequency 1.279 2 1.064 No multicollinearity
age 1.150 2 1.035 No multicollinearity
gender 1.048 1 1.024 No multicollinearity

Finding: No evidence of multicollinearity (adjusted VIF ≤ 1.173 for all listed variables).

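Observation: The low values indicate that the listed predictors, such as family support and teacher-student relations, are not strongly collinear, which supports the stability of the coefficient estimates. Note, however, that the table includes several variables that are not in the Core Model (e.g., family_financial_status, teachers_ratings, age) and omits two that are (math_enjoyment and number_on_roll), so it appears to have been computed for a broader candidate model; a VIF check on the Core Model itself would confirm this for its predictors.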
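The Adjusted_VIF column equals GVIF^(1/(2·Df)), the adjustment for terms with several degrees of freedom (such as multi-level factors) that R’s car package reports from `vif()`; for a one-df term it reduces to the square root of the ordinary VIF. The minimal Python sketch below reproduces a few rows of the table. The cut-off (comparing the adjusted value with sqrt(5), the square root of a common VIF threshold of 5) is one convention among several and is an assumption, not the criterion stated in the original analysis.

```python
import math


def adjusted_gvif(gvif, df):
    """GVIF^(1/(2*Df)); for a single-df term this equals sqrt(VIF)."""
    return gvif ** (1.0 / (2 * df))


# (GVIF, Df) for a few rows of the table above
rows = {
    "family_financial_status": (1.893, 2),
    "teacher_student_relation": (2.133, 3),
    "availability_of_learning_materials": (1.219, 1),
    "gender": (1.048, 1),
}

threshold = math.sqrt(5)  # assumed convention: square root of a VIF cut-off of 5
adjusted = {name: adjusted_gvif(g, d) for name, (g, d) in rows.items()}
for name, value in adjusted.items():
    flag = "check" if value > threshold else "ok"
    print(f"{name:36s} {value:.3f} {flag}")
```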

7.0.2 Model Coefficients and Odds Ratios

Table 7: Multinomial Logistic Regression Coefficients and Odds Ratios (Core Model; reference category: Poor)
Performance_Level Variable Coefficient Std Error z-value p-value Sig OR [95% CI]
Average vs Poor
Average math_enjoymentMaybe 1.9595183 0.8991370 2.1793323 0.029 * 7.10 [1.22, 41.34]
Average math_enjoymentYes 1.4090836 0.7283555 1.9346097 0.053 4.09 [0.98, 17.06]
Average homework_frequency.L 0.9723862 0.8215724 1.1835672 0.237 2.64 [0.53, 13.23]
Average homework_frequency.Q 0.1375024 0.6977460 0.1970665 0.844 1.15 [0.29, 4.50]
Average family_support.L 0.9509576 0.6416735 1.4819959 0.138 2.59 [0.74, 9.10]
Average family_support.Q -0.6098070 0.5861229 -1.0404082 0.298 0.54 [0.17, 1.71]
Average genderMale -0.3136964 0.6315993 -0.4966700 0.619 0.73 [0.21, 2.52]
Average availability_of_learning_materialsYes 0.1231724 0.6669545 0.1846788 0.853 1.13 [0.31, 4.18]
Average study_hours.L -0.2366739 0.5526153 -0.4282797 0.668 0.79 [0.27, 2.33]
Average study_hours.Q -0.3496534 0.4927998 -0.7095243 0.478 0.70 [0.27, 1.85]
Average number_on_roll -0.0050135 0.0205998 -0.2433743 0.808 0.99 [0.96, 1.04]
Good vs Poor
Good math_enjoymentMaybe 1.1259041 0.9361611 1.2026820 0.229 3.08 [0.49, 19.31]
Good math_enjoymentYes 1.8401924 0.7333014 2.5094625 0.012 * 6.30 [1.50, 26.51]
Good homework_frequency.L 0.3172617 0.7868300 0.4032151 0.687 1.37 [0.29, 6.42]
Good homework_frequency.Q 1.1997631 0.7203663 1.6654903 0.096 3.32 [0.81, 13.62]
Good family_support.L 2.1179365 0.8921877 2.3738686 0.018 * 8.31 [1.45, 47.78]
Good family_support.Q -1.3014206 0.6898759 -1.8864561 0.059 0.27 [0.07, 1.05]
Good genderMale 0.1404406 0.6272989 0.2238815 0.823 1.15 [0.34, 3.94]
Good availability_of_learning_materialsYes 0.3165066 0.6830583 0.4633669 0.643 1.37 [0.36, 5.23]
Good study_hours.L 0.4168687 0.5553202 0.7506816 0.453 1.52 [0.51, 4.51]
Good study_hours.Q -0.1674887 0.4980079 -0.3363173 0.737 0.85 [0.32, 2.24]
Good number_on_roll -0.0114041 0.0210349 -0.5421507 0.588 0.99 [0.95, 1.03]
Excellent vs Poor
Excellent math_enjoymentMaybe 0.6259619 1.6629056 0.3764266 0.707 1.87 [0.07, 48.68]
Excellent math_enjoymentYes 3.5939467 1.2153002 2.9572501 0.003 ** 36.38 [3.36, 393.84]
Excellent homework_frequency.L 0.9423822 1.1108253 0.8483622 0.396 2.57 [0.29, 22.64]
Excellent homework_frequency.Q 1.3732998 0.9558408 1.4367454 0.151 3.95 [0.61, 25.71]
Excellent family_support.L 1.2721617 0.7765326 1.6382593 0.101 3.57 [0.78, 16.35]
Excellent family_support.Q 0.1149641 0.7367832 0.1560353 0.876 1.12 [0.26, 4.75]
Excellent genderMale 0.7682743 0.6686305 1.1490268 0.251 2.16 [0.58, 7.99]
Excellent availability_of_learning_materialsYes 0.1429419 0.7636155 0.1871910 0.852 1.15 [0.26, 5.15]
Excellent study_hours.L 0.3248669 0.5902947 0.5503470 0.582 1.38 [0.44, 4.40]
Excellent study_hours.Q 0.4197273 0.5390165 0.7786909 0.436 1.52 [0.53, 4.38]
Excellent number_on_roll -0.0219989 0.0231904 -0.9486207 0.343 0.98 [0.93, 1.02]

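Note: Sig: * p < 0.05, ** p < 0.01; Poor is the reference category. The suffixes .L and .Q denote linear and quadratic polynomial contrasts (R’s default coding for ordered factors), so their odds ratios describe trends across the ordered categories rather than comparisons between two specific categories.

Finding: Math enjoyment was the strongest predictor. Students answering “Yes” had much higher odds of being Excellent rather than Poor (OR=36.38, 95% CI [3.36, 393.84], p=0.003) and of being Good rather than Poor (OR=6.30, p=0.012), and a “Maybe” response was associated with higher odds of Average rather than Poor (OR=7.10, p=0.029). Family support (linear trend) was significant for Good vs. Poor (OR=8.31, p=0.018). Homework frequency had positive but non-significant odds ratios (e.g., OR=3.95 for Excellent vs. Poor, p=0.151). Gender, learning materials, study hours, and class size were not significant (p>0.05).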

Observation: The large odds ratio for math enjoyment (36.38) underscores its pivotal role, particularly for top performers, and suggests that intrinsic motivation is a powerful driver; the very wide confidence interval [3.36, 393.84], however, reflects the small Poor reference group (18 training cases) and means the magnitude is estimated imprecisely. Family support’s significant effect for Good vs. Poor (OR=8.31) highlights its role in distinguishing low from high performers. The non-significance of class size (p≥0.343) despite its descriptive trend, and of gender, may indicate that their effects are overshadowed by behavioral factors once these are taken into account.
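Each odds ratio and confidence interval in the coefficient table follows from the coefficient b and its standard error: OR = exp(b), 95% CI = exp(b ± 1.96 × SE), and the Wald statistic z = b / SE with a two-sided normal p-value. The sketch below applies these standard formulas to the Excellent vs. Poor coefficient for math enjoyment (“Yes”); it is a re-computation for checking, not the author’s code, and the helper name `wald_summary` is illustrative.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Wald z, two-sided p, odds ratio and 95% CI for one multinomial-logit coefficient."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    odds_ratio = math.exp(coef)
    ci = (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se))
    return z, p, odds_ratio, ci


# Excellent vs Poor, math_enjoymentYes (coefficient and SE from the table above)
z, p, odds_ratio, (lower, upper) = wald_summary(3.5939467, 1.2153002)
print(f"z = {z:.3f}, p = {p:.4f}, OR = {odds_ratio:.2f} [{lower:.2f}, {upper:.2f}]")
```

This reproduces the tabulated row (z ≈ 2.957, OR = 36.38, 95% CI [3.36, 393.84]); the upper limit matches the table when the critical value is 1.96 rather than 1.959964.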

7.1 Model Evaluation

7.1.1 Confusion Matrix

Table 8: Confusion Matrix (Test Set; rows = predicted level, columns = actual level)
Prediction Actual_Poor Actual_Average Actual_Good Actual_Excellent Pct_of_Actual_Poor Pct_of_Actual_Average Pct_of_Actual_Good Pct_of_Actual_Excellent Total
Poor 2 3 0 0 28.6 9.1 0.0 0.0 5
Average 2 19 5 1 28.6 57.6 12.2 4.0 27
Good 2 10 32 18 28.6 30.3 78.0 72.0 62
Excellent 1 1 4 6 14.3 3.0 9.8 24.0 12
Actual Total 7 33 41 25 6.6 31.1 38.7 23.6 106
Note: Accuracy: 55.7%; Kappa: 0.333. The percentage columns give the share of each actual level assigned to each predicted level, so the diagonal entries are per-level recall; in the Actual Total row they give each level’s share of the 106 test students.

Finding: Accuracy was 55.7% and Cohen’s Kappa 0.333 (fair agreement). Recall was low for the Poor (28.6%) and Excellent (24.0%) levels and higher for Good (78.0%) and Average (57.6%). AUC ranged from 0.693 (Good) to 0.789 (Average).

Observation: The moderate accuracy (55.7%), which exceeds the 38.7% obtained by always predicting the most frequent level (Good), reflects difficulty with the extreme categories. Poor is rare (7 test students), and most Excellent students (18 of 25, 72%) were predicted as Good, suggesting that the predictors separate the two upper levels poorly. The higher recall for Good and Average aligns with their larger representation in the data. The AUC values indicate reasonable discrimination, particularly for Average, but leave room for improvement for Poor and Excellent students. Kappa (0.333) indicates fair agreement beyond chance, so the model carries useful signal despite its limitations.
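Accuracy, Cohen’s kappa, and the no-information rate quoted above can all be recomputed from the confusion matrix. Kappa compares the observed agreement p_o with the agreement expected by chance from the row and column totals, p_e: κ = (p_o - p_e) / (1 - p_e); on the widely used Landis and Koch scale, values between 0.21 and 0.40 are labelled “fair”. A standard-library Python sketch (the function name is illustrative):

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy, Cohen's kappa and no-information rate from a square confusion matrix."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    pred_totals = [sum(row) for row in confusion]
    actual_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(p * a for p, a in zip(pred_totals, actual_totals)) / n ** 2  # chance agreement
    kappa = (observed - expected) / (1 - expected)
    no_information_rate = max(actual_totals) / n  # always predict the most frequent actual level
    return observed, kappa, no_information_rate


# Table 8 counts: rows = predicted, columns = actual (Poor, Average, Good, Excellent)
confusion = [
    [2, 3, 0, 0],
    [2, 19, 5, 1],
    [2, 10, 32, 18],
    [1, 1, 4, 6],
]
accuracy, kappa, nir = accuracy_and_kappa(confusion)
print(f"accuracy = {accuracy:.3f}, kappa = {kappa:.3f}, no-information rate = {nir:.3f}")
```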

7.2 ROC Curves

Table 9: AUC Analysis (Test Set)
Performance_Level AUC
Poor 0.722
Average 0.789
Good 0.693
Excellent 0.739

AUC values (Average 0.789, Excellent 0.739, Poor 0.722, Good 0.693) indicate fair to good discrimination, with Average best separated from the other levels.
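Per-level AUCs like those in Table 9 are commonly computed one-vs-rest: for each level, students at that level are positives, all others are negatives, and the score is the predicted probability of that level. The AUC then equals the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counted as one half. The original analysis does not state which multiclass AUC variant was used, and the probabilities below are hypothetical values for illustration, not model output.

```python
def one_vs_rest_auc(scores, labels, positive):
    """AUC as the Mann-Whitney probability that a positive case outscores a negative case."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of "Average" for six students (illustrative only)
scores = [0.70, 0.55, 0.40, 0.60, 0.20, 0.10]
labels = ["Average", "Average", "Average", "Good", "Poor", "Excellent"]
auc = one_vs_rest_auc(scores, labels, positive="Average")
print(f"one-vs-rest AUC for Average: {auc:.3f}")
```

Here 7 of the 9 positive/negative pairs are ordered correctly, so the AUC is 7/9 (about 0.778). The pairwise loop is quadratic in the number of students, which is fine for a test set of 106.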

7.2.1 Classification Metrics

Table 10: Classification Metrics by Performance Level (Test Set)
Performance_Level Sensitivity Specificity Precision F1_Score Balanced_Accuracy
Poor 0.286 0.970 0.400 0.333 0.628
Average 0.576 0.890 0.704 0.633 0.733
Good 0.780 0.538 0.516 0.621 0.659
Excellent 0.240 0.926 0.500 0.324 0.583

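Average had the most balanced profile (sensitivity 0.576, specificity 0.890, F1-score 0.633, and the highest balanced accuracy, 0.733). Good had the highest sensitivity (0.780) but low specificity (0.538), because the model assigned 62 of the 106 test students to Good. Poor and Excellent had low sensitivity (0.286 and 0.240), confirming the difficulty of identifying the extreme levels.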
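The per-level metrics in Table 10 follow from the same confusion matrix by treating each level in turn as the positive class. A minimal standard-library sketch (the function name `per_class_metrics` is hypothetical):

```python
def per_class_metrics(confusion, classes):
    """Sensitivity, specificity, precision, F1 and balanced accuracy for each class.

    confusion[i][j] counts students predicted as classes[i] whose actual level is classes[j].
    """
    n = sum(sum(row) for row in confusion)
    metrics = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        predicted = sum(confusion[k])              # row total
        actual = sum(row[k] for row in confusion)  # column total
        fp = predicted - tp
        fn = actual - tp
        tn = n - tp - fp - fn
        sensitivity = tp / actual
        specificity = tn / (tn + fp)
        precision = tp / predicted if predicted else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        metrics[name] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "f1": f1,
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return metrics


classes = ["Poor", "Average", "Good", "Excellent"]
confusion = [
    [2, 3, 0, 0],
    [2, 19, 5, 1],
    [2, 10, 32, 18],
    [1, 1, 4, 6],
]
metrics = per_class_metrics(confusion, classes)
for name, m in metrics.items():
    print(name, {key: round(value, 3) for key, value in m.items()})
```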

8 Key Factors and Practical Implications

Table 11: Top Influential Factors in Mathematics Performance
Factor_Name         Factor_Domain   Avg_Z_Value  Min_P_Value  Significance_Rating  Importance_Score
Critical Factors
Math Enjoyment      Behavioral      2.47         0.0031       Moderate (2 Levels)  6.6
Family Support      Socio-Economic  1.83         0.0176       Low (1 Level)        2.4
Math Enjoyment      Behavioral      1.25         0.0293       Low (1 Level)        1.7
Learning Materials  Educational     0.28         0.6431       Not Significant      0.0
Family Support      Socio-Economic  1.03         0.0592       Not Significant      0.0
Important Factors
Gender              Demographic     0.62         0.2505       Not Significant      0.0
Homework Frequency  Behavioral      0.81         0.2366       Not Significant      0.0
Homework Frequency  Behavioral      1.10         0.0958       Not Significant      0.0
Number On Roll      Other           0.58         0.3428       Not Significant      0.0
Study Hours         Behavioral      0.58         0.4528       Not Significant      0.0
Note: Importance Score = (Avg |z-value| × (4 - Min p-value)) × (Significant Levels / 3). Factors listed twice correspond to separate model coefficients for the same predictor (e.g., different contrast terms).
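The Importance Score defined in the note to Table 11 can be recomputed from the other columns of the table. The sketch below reproduces the formula as stated and assumes that "Significant Levels" is the count shown in the Significance_Rating column, that is, the number of outcome-level contrasts (out of three) with p < 0.05.

```python
# Recompute the Importance Score in Table 11 from its other columns.
# ASSUMPTION: significant_levels is the count in the Significance_Rating column
# ("Moderate (2 Levels)" -> 2, "Low (1 Level)" -> 1, "Not Significant" -> 0).

def importance_score(avg_abs_z: float, min_p: float, significant_levels: int) -> float:
    """(Avg |z| x (4 - min p)) x (significant levels / 3), per the note to Table 11."""
    return avg_abs_z * (4 - min_p) * (significant_levels / 3)

# (factor, avg |z|, min p, significant levels, reported score) from Table 11
rows = [
    ("Math Enjoyment", 2.47, 0.0031, 2, 6.6),
    ("Family Support", 1.83, 0.0176, 1, 2.4),
    ("Math Enjoyment", 1.25, 0.0293, 1, 1.7),
    ("Learning Materials", 0.28, 0.6431, 0, 0.0),
]

for name, z, p, k, reported in rows:
    score = importance_score(z, p, k)
    print(f"{name:<18} recomputed {score:.1f}  reported {reported:.1f}")
```

Because the final factor is zero whenever no level is significant, every non-significant predictor receives a score of 0.0 regardless of its |z|-value, which explains the block of zeros in Table 11.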

Table 12: Practical Effect Sizes and Interpretations
Factor_Name         Min_OR  Max_OR  Avg_OR  Effect_Direction  Effect_Magnitude  Practical_Interpretation
Math Enjoyment      4.09    36.38   15.59   Positive          Moderate          Students who enjoy mathematics are significantly more likely to achieve higher performance
Family Support      4.09    36.38   15.59   Positive          Moderate          Strong family support significantly enhances mathematics achievement
Math Enjoyment      4.09    36.38   15.59   Positive          Moderate          Students who enjoy mathematics are significantly more likely to achieve higher performance
Learning Materials  4.09    36.38   15.59   Positive          Moderate          Access to learning materials is essential for academic achievement
Family Support      4.09    36.38   15.59   Positive          Moderate          Strong family support significantly enhances mathematics achievement
Gender              4.09    36.38   15.59   Positive          Moderate          No significant association with mathematics performance in the model
Homework Frequency  4.09    36.38   15.59   Positive          Moderate          Regular study habits and homework completion are critical for academic success
Homework Frequency  4.09    36.38   15.59   Positive          Moderate          Regular study habits and homework completion are critical for academic success
Number On Roll      4.09    36.38   15.59   Positive          Moderate          No significant association with mathematics performance in the model
Study Hours         4.09    36.38   15.59   Positive          Moderate          Regular study habits and homework completion are critical for academic success
Note: Min_OR, Max_OR, and Avg_OR are identical for every factor and therefore do not distinguish between factors; factor-specific odds ratios are reported in Section 10 (e.g., 36.38 for Math Enjoyment and 8.31 for Family Support).
Top 10 Predictors of Mathematics Performance Ranked by Weighted |z-Statistic|
Predictor           Importance_Score
Math Enjoyment      2.459
Family Support      1.799
Math Enjoyment      1.216
Homework Frequency  0.994
Family Support      0.967
Homework Frequency  0.620
Gender              0.467
Number On Roll      0.380
study_hours.Q       0.343
Study Hours         0.315

Top Predictors:

Finding: Math Enjoyment (score 2.459), Family Support (1.799), and Homework Frequency (0.994) were the top predictors. Gender, learning materials, study hours, and class size (Number On Roll) were less influential.

Observation: The dominance of math enjoyment and family support is consistent with their significant odds ratios, reinforcing their central role in performance. Homework frequency’s ranking reflects the value of consistent effort, but its p-values (0.0958 and 0.2366) are not significant, so its contribution should be treated as supportive and tentative rather than primary. The low impact of gender and class size may indicate that their effects are context-specific or mediated by other factors such as motivation or resources.

Practical Implications:

Finding: Foster math enjoyment through engaging teaching, strengthen family support programs, promote regular homework, ensure access to learning materials, and consider smaller class sizes.

Observation: The emphasis on math enjoyment suggests that interventions such as interactive math curricula or gamification could boost engagement, particularly among low performers. Family support programs, such as parent workshops, could reach the 24% of Low Proficiency students with poor support. The importance of homework frequency highlights the need for structured assignments and monitoring to build study habits. Although learning materials were not significant in the model (p = 0.6431), the gap in their availability (70% of High vs. 56% of Low Proficiency students) underscores the importance of equitable resource distribution, especially in resource-constrained settings. Similarly, class size was not significant in the model (p = 0.343), but the descriptive trend (37.5 vs. 43.4) supports advocating for smaller classes where feasible.

9 Conclusion

Finding: The multinomial logistic regression model identified math enjoyment, family support, and homework frequency as key predictors of mathematics performance, achieving moderate accuracy (55.7%) and class-specific AUC values between 0.693 and 0.789. Practical implications focus on fostering positive attitudes, family engagement, and consistent study habits.

Observation: The model’s success in identifying key behavioral and socio-economic factors provides a strong foundation for targeted interventions, particularly for improving motivation and support systems. The lower accuracy for Poor and Excellent levels suggests that these groups may require specialized models or additional predictors (e.g., psychological or contextual factors). The study’s actionable insights, such as enhancing math enjoyment and family support, are particularly relevant for educators and policymakers aiming to address performance gaps. Limitations, such as misclassification errors and non-significant factors like gender, highlight the need for further research into contextual influences and larger sample sizes for extreme proficiency levels.

  • Actionable Insights:

Interventions should focus on fostering math enjoyment and family support to improve outcomes. Schools can implement structured homework policies and provide learning resources. Because the model often misclassifies low-performing students as Average or Good, its predictions could overestimate their proficiency, so these students need targeted identification and support rather than reliance on model predictions alone.

10 Analysis Discussion

This study used a multinomial logistic regression framework to predict mathematics achievement among 358 Junior High School students in the Oforikrom Municipality, Ghana, addressing four research objectives: examining factor relationships, developing a predictive model, evaluating its performance, and interpreting key predictors for practical interventions. The findings offer valuable insights into the determinants of mathematics performance and their implications for educational policy, while also revealing limitations that guide future research.

Relationships Between Factors and Performance (Objective 1, Question 1)

Chi-square and Fisher’s exact tests showed significant associations (p < 0.05) between mathematics achievement and behavioral factors (math enjoyment, homework frequency, study hours, school attendance), socio-economic factors (family support, pocket money), and school-related factors (teachers’ ratings, teaching quality, teacher-student relations). Notably, 84.2% of High Proficiency students reported enjoying mathematics, compared with 40% of Low Proficiency students, and 89.6% of High Proficiency students completed homework regularly, underscoring the role of student engagement. Family support was stronger among High Proficiency students (77% rated it “Good” vs. 44% of Low Proficiency students), in line with prior research linking parental involvement to academic success (Amoah et al., 2020). Among environmental factors, a negative home environment was also significantly associated with achievement, whereas classroom conditions (e.g., ventilation, conduciveness) were less influential. The small effect sizes (Cramér’s V < 0.2) suggest that, although these associations are statistically significant, their practical impact is limited and may depend on other, unmeasured factors.
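Cramér’s V rescales the chi-square statistic to the range 0 to 1: V = √(χ² / (n × (min(r, c) − 1))), where n is the number of students and r and c are the numbers of rows and columns of the contingency table. The sketch below computes both quantities for a hypothetical 2 × 2 table (enjoys mathematics: yes/no, by two proficiency groups); the counts are invented for illustration, and no continuity correction is applied.

```python
import math

def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))

# HYPOTHETICAL counts: rows = enjoys maths (yes, no), cols = (high, low) proficiency
example = [[10, 20],
           [20, 10]]
print(f"chi2 = {chi_square(example):.3f}, V = {cramers_v(example):.3f}")
```

Because V divides by n, the same strength of association yields a larger χ² (and a smaller p-value) in a larger sample, which is why significant tests can coexist with small V values.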

Significant Predictors (Objective 2, Question 2)

The core multinomial logistic regression model, selected for its low AIC (611.0) and parsimony, identified math enjoyment and family support as the most significant predictors. Students who enjoyed mathematics had 36.38 times the odds of achieving Excellent rather than Poor performance (p = 0.003), and those with strong family support had 8.31 times the odds of Good rather than Poor performance (p = 0.018). Homework frequency showed a moderate but non-significant effect (e.g., OR = 3.95 for Excellent vs. Poor, p = 0.151), while other factors such as gender, study hours, and class size had small or non-significant effects. These findings are consistent with educational theories emphasizing intrinsic motivation and socio-economic support as drivers of academic achievement (Ryan & Deci, 2000). The non-significance of class size (p = 0.343) contrasts with some literature (e.g., Hanushek, 1999), suggesting that contextual factors in Ghanaian schools may dilute its impact.
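In a multinomial logistic regression, each coefficient β is a log-odds contrast against the reference level (Poor in the comparisons above), so the odds ratio is exp(β). The text reports odds ratios and p-values but no confidence intervals; the sketch below approximates a 95% interval from those two numbers alone, assuming the p-values come from two-sided Wald tests (z = β / SE). The helper name is hypothetical and the result is a back-of-the-envelope illustration, not a reported finding.

```python
import math
from statistics import NormalDist

def wald_ci_from_or_p(odds_ratio: float, p_value: float,
                      level: float = 0.95) -> tuple[float, float]:
    """Approximate CI for an odds ratio from its estimate and two-sided p-value.

    ASSUMPTION: the p-value comes from a Wald test, so |z| = |beta| / SE with
    beta = ln(OR). Rounded inputs make the interval approximate.
    """
    beta = math.log(odds_ratio)                       # log-odds coefficient
    z = NormalDist().inv_cdf(1 - p_value / 2)         # |z| implied by the p-value
    se = abs(beta) / z                                # implied standard error
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return math.exp(beta - crit * se), math.exp(beta + crit * se)

# Odds ratios and p-values quoted in the text (vs. the Poor reference level)
estimates = {
    "Math Enjoyment (Excellent vs. Poor)": (36.38, 0.003),
    "Family Support (Good vs. Poor)": (8.31, 0.018),
    "Homework Frequency (Excellent vs. Poor)": (3.95, 0.151),
}
for label, (or_, p) in estimates.items():
    lo, hi = wald_ci_from_or_p(or_, p)
    print(f"{label}: OR = {or_:.2f}, approx. 95% CI {lo:.2f} to {hi:.2f}")
```

Under these assumptions the intervals are very wide (for Math Enjoyment, from about 3 to several hundred), which is consistent with the small number of Poor students in the reference category; the Homework Frequency interval includes 1, matching its non-significant p-value.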

Model Performance (Objective 3, Questions 3 and 4)

The model achieved a moderate accuracy of 55.7% (Kappa = 0.333), with better classification performance for the Good (78% sensitivity) and Average (57.6% sensitivity) levels but poor performance for the Poor (28.6%) and Excellent (24%) levels. AUC values ranged from 0.693 (Good) to 0.789 (Average), indicating fair to good discriminatory ability. The confusion matrix revealed frequent misclassification of Poor students as Average or Good, suggesting that the model overestimates performance for low achievers, possibly because of the small proportion of Poor students (7%). Similarly, Excellent students were often misclassified as Good; together with the low specificity for Good (0.538), this indicates that the model tends to assign students from neighboring levels to Good and struggles to distinguish top performers. These patterns suggest that the model may benefit from additional predictors (e.g., psychological factors such as self-efficacy) or from techniques such as oversampling to address class imbalance. The moderate accuracy and Kappa show the model’s utility for mid-range performers but underscore its limitations for the extreme categories.

Actionable Insights (Objective 4, Question 5)

The most influential factors, namely math enjoyment, family support, and homework frequency, offer clear pathways for interventions. Schools can foster math enjoyment through engaging, student-centered curricula, such as gamified learning or real-world applications, to enhance intrinsic motivation. Family support can be strengthened through parent engagement programs, such as workshops that equip parents with strategies to support learning. Regular homework completion, which ranked among the top predictors, can be encouraged through structured routines and teacher feedback. Access to learning materials, though not significant in the model, may still matter in practice: 70% of High Proficiency students reported having materials available, compared with 56% of Low Proficiency students. These interventions align with global evidence on effective educational strategies (Hattie, 2009). Policymakers should prioritize resource allocation to ensure equitable access to materials, along with professional development for teachers to improve teaching quality, which was significantly associated with performance.

Limitations and Future Research

The model’s moderate accuracy (55.7%) and low sensitivity for Poor and Excellent levels suggest limitations in capturing extreme performance categories, likely due to the imbalanced dataset (only 7% Poor). Future studies could employ advanced techniques like SMOTE (Synthetic Minority Oversampling Technique) or ensemble methods to improve classification. The non-significance of some factors (e.g., class size, gender) may reflect sample size constraints or contextual nuances, warranting larger, multi-region studies. Additionally, unmeasured psychological factors (e.g., math anxiety, self-efficacy) could enhance model performance and should be explored. Longitudinal designs could further clarify causal relationships, as this study’s cross-sectional nature limits causal inferences.
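SMOTE (Synthetic Minority Oversampling Technique) creates new minority-class cases by interpolating between a minority observation and one of its k nearest minority-class neighbors, instead of duplicating existing rows. The sketch below is a simplified illustration of that interpolation step for numeric features only; the helper name and feature values are hypothetical, and because many of the study’s predictors are categorical ratings, a categorical-aware variant such as SMOTE-NC would be needed in practice.

```python
import math
import random

def smote(minority: list[list[float]], n_new: int, k: int = 3,
          seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic points from numeric minority-class rows.

    Each synthetic point lies on the segment between a randomly chosen
    minority row and one of its k nearest minority-class neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbors of `base` among the other minority rows
        others = [row for j, row in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

# HYPOTHETICAL numeric features (e.g. two standardized scores) for 4 Poor students
poor_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(poor_rows, n_new=5, k=2)
print(len(new_rows), new_rows[0])
```

Interpolating rather than copying spreads the synthetic Poor cases across the region occupied by real Poor students, which reduces the risk that the classifier simply memorizes duplicated rows.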